Abstract. We generalise the procedure for joint estimation of cosmological parameters to allow freedom in the relative weights of various probes. This is done by including in the joint Likelihood function a set of 'Hyper-Parameters', which are dealt with using Bayesian considerations. The resulting algorithm is simple to implement. We illustrate the method by estimating the Hubble constant $H_0$ from the recent Cosmic Microwave Background experiments Boomerang and Maxima. For an assumed flat $\Lambda$-CDM model with fixed parameters ($n = 1$, $\Omega_m = 1 - \Lambda = 0.3$, $\Omega_b h^2 = 0.03$, $Q_{\rm rms} = 18\,\mu$K) we solve for a single parameter, $H_0 = 79 \pm 4$ km/sec/Mpc (95% CL, random errors only), slightly higher than, but still consistent with, recent results from Cepheids. We discuss how the 'Hyper-Parameters' approach can be generalised for a combination of cosmic probes, and for other priors on the Hyper-Parameters.



1. Introduction 

Several groups (e.g. Gawiser & Silk 1998; Webster et al. 1998; Lineweaver 1998; Eisenstein, Hu & Tegmark 1999; Efstathiou et al. 1999; Bridle et al. 1999, 2000; Bahcall et al. 1999) have recently discussed the estimation of cosmological parameters by joint analysis of data sets, e.g. Cosmic Microwave Background (CMB), SNe Ia, redshift surveys, cluster abundance and peculiar velocities.

While joint Likelihood analyses employing both CMB and large-scale structure (LSS) data are allowing more accurate estimates of cosmological parameters, they involve various subtle statistical issues:



• The choice of the model parameter space is somewhat arbitrary. 

• One commonly solves for the probability of the data given a model (e.g. using a Likelihood function), while in the Bayesian framework this should be modified by the prior for the model.

• If one is interested in a small set of parameters, should one marginalise over all the remaining parameters, rather than fix them at certain (somewhat ad-hoc) values?

• The 'topology' of the Likelihood contours may not be simple. It is helpful when the Likelihood contours of different probes 'cross' each other to yield a global maximum (e.g. in the case of CMB and SNe), but in other cases they may yield distinct separate 'mountains', and the joint maximum Likelihood may lie in a 'valley'.

• Different probes might be spatially correlated, i.e. not necessarily independent.

• What weight should one give to each data set?

The above points have been discussed in many papers in the cosmological literature and also at this conference. Here we focus on the last point. A conventional approach does not take into account the fact that different systematics may affect each data set. The problem arises when data sets are inconsistent with one another. One approach is to combine inconsistent data sets in the hope that the various systematic effects will tend to cancel out. However, this may lead to problems if all of parameter space is ruled out by one data set or another. The alternative approach is to choose, somewhat ad hoc, a mutually consistent group of data sets to combine. Lahav et al. (2000; hereafter L2000) presented a more objective method for dealing with disagreement between data sets by utilising 'Hyper-Parameters' (hereafter HPs). Some previous approaches to this problem of assigning the relative weights of different measurements have been suggested in the astronomical literature (e.g. Godwin & Lynden-Bell 1987; Press 1996).

The derivation of HPs is given in Section 2. In Section 3 we apply the method to the recent Boomerang and Maxima CMB experiments, and we estimate the best-fit Hubble constant ($H_0 = 100\,h$ km/sec/Mpc). Extensions of the method are discussed in Section 4.

2. 'Hyper-Parameters' 

Assume that we have two independent data sets, $D_A$ and $D_B$ (with $N_A$ and $N_B$ data points respectively), and that we wish to determine a vector of free parameters $w$ (such as the density parameter $\Omega_m$, the Hubble constant $H_0$, etc.). This is commonly done by minimising

$$\chi^2_{\rm joint} = \chi^2_A + \chi^2_B \qquad (1)$$

(or, more generally, maximising the product of Likelihood functions).

Such procedures assume that the quoted observational random errors can be trusted, and that the two (or more) $\chi^2$s have equal weights. However, when combining 'apples and oranges' one may wish to allow freedom in the relative weights. One possible approach is to generalise Eq. 1 to be

$$\chi^2_{\rm joint} = \alpha\,\chi^2_A + \beta\,\chi^2_B, \qquad (2)$$

where $\alpha$ and $\beta$ are 'Hyper-Parameters', which are to be dealt with in the following Bayesian way. There are a number of ways to interpret the meaning of the HPs. One way is to understand $\alpha$ and $\beta$ as controlling the relative weight of the two data sets. It is not uncommon that astronomers accept and discard measurements (e.g. by assigning $\alpha = 1$ and $\beta = 0$) in an ad-hoc way. The procedure proposed by L2000 gives an objective diagnostic as to which measurements are problematic and deserve further understanding of systematic or random errors.
A simple example of the HPs is the case that

$$\chi^2_A = \sum_{i=1}^{N_A} \frac{[x_{{\rm obs},i} - x_{{\rm pred},i}(w)]^2}{\sigma_i^2}, \qquad (3)$$

where the sum is over $N_A$ measurements and corresponding predictions and errors $\sigma_i$. Hence by multiplying $\chi^2_A$ by $\alpha$ we may interpret each error as effectively becoming $\alpha^{-1/2}\sigma_i$.
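A minimal Python sketch of Eq. 3 and of the error-rescaling interpretation of $\alpha$, using only the standard library. The function name `chi2` and the toy numbers are hypothetical illustrations, not data from this paper; the check simply confirms that $\alpha\chi^2_A$ equals the $\chi^2$ obtained after shrinking every $\sigma_i$ to $\alpha^{-1/2}\sigma_i$.

```python
import math

def chi2(x_obs, x_pred, sigma):
    """Eq. 3: sum of squared residuals, each normalised by its quoted error."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(x_obs, x_pred, sigma))

# Hypothetical toy measurements, model predictions and errors (not real CMB data).
x_obs = [1.2, 0.9, 1.4]
x_pred = [1.0, 1.0, 1.0]
sigma = [0.1, 0.2, 0.3]
alpha = 4.0

# Scaling chi^2 by alpha is equivalent to replacing each sigma_i by alpha**-0.5 * sigma_i.
lhs = alpha * chi2(x_obs, x_pred, sigma)
rhs = chi2(x_obs, x_pred, [s / math.sqrt(alpha) for s in sigma])
print(lhs, rhs)
```

An $\alpha > 1$ therefore behaves as if the quoted errors were too large, and $\alpha < 1$ as if they were underestimated.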

How do we eliminate the unknown HPs $\alpha$ and $\beta$? L2000 followed the Bayesian formalism given (in other contexts) in Gull (1989), MacKay (1992), Bishop (1995) and Sivia (1996). By marginalisation over $\alpha$ and $\beta$ we can write the probability for the parameters $w$ given the data:

$$P(w|D_A, D_B) = \int\!\!\int P(w, \alpha, \beta|D_A, D_B)\, d\alpha\, d\beta. \qquad (4)$$

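Using Bayes' theorem we can write the following relations:

$$P(w, \alpha, \beta|D_A, D_B) = \frac{P(D_A, D_B|w, \alpha, \beta)\, P(w, \alpha, \beta)}{P(D_A, D_B)} \qquad (5)$$

and

$$P(w, \alpha, \beta) = P(w|\alpha, \beta)\, P(\alpha, \beta). \qquad (6)$$

We now make the following assumptions:

$$P(D_A, D_B|w, \alpha, \beta) = P(D_A|w, \alpha)\, P(D_B|w, \beta), \qquad (7)$$

$$P(w|\alpha, \beta) = {\rm const.}, \qquad (8)$$

$$P(\alpha, \beta) = P(\alpha)\, P(\beta). \qquad (9)$$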

With the choice of 'non-informative' priors uniform in the log, $P(\ln\alpha) = P(\ln\beta) = 1$, we get $P(\alpha) = 1/\alpha$ and $P(\beta) = 1/\beta$ (Jeffreys 1939). Note that the integral over priors of this kind diverges (such a prior is called 'improper'; see Bishop 1995). These are very conservative priors, essentially stating that we are ignorant about the scale of measurements and errors. The other extreme is obviously $P(\alpha) = \delta(\alpha - 1)$, i.e. when the measurements and errors are taken faithfully. One can try other forms (see below), but it is likely that these two extreme forms reasonably bracket the probability space. Hence:

$$P(w|D_A, D_B) = \frac{1}{P(D_A, D_B)}\, P(D_A|w)\, P(D_B|w), \qquad (10)$$

where

$$P(D_A|w) = \int P(D_A|w, \alpha)\, \alpha^{-1}\, d\alpha \qquad (11)$$

and

$$P(D_B|w) = \int P(D_B|w, \beta)\, \beta^{-1}\, d\beta. \qquad (12)$$
It is common to have a likelihood function of the form of a Gaussian in $N_A$ dimensions:

$$P_G(D_A|w) \propto \exp(-\chi^2_A/2), \qquad (13)$$

where we assume for simplicity that the normalization constant is independent 
of the parameters w (this is indeed the case in our application for the CMB 
measurements in the next Section). 

We generalise this form to incorporate $\alpha$ as follows:

$$P(D_A|w, \alpha) \propto \alpha^{N_A/2} \exp\left(-\frac{\alpha}{2}\chi^2_A\right). \qquad (14)$$

The integral of Eq. 11 then gives

$$P(D_A|w) \propto (\chi^2_A)^{-N_A/2}, \qquad (15)$$

and similarly for Eq. 12. We note that it is the specific choice of prior $P(\alpha) = 1/\alpha$ that has led to a change from a Gaussian distribution (Eq. 13) to a power law (Eq. 15). Eq. 10 can then be written (ignoring constants) as

$$-2\ln P(w|D_A, D_B) = N_A \ln(\chi^2_A) + N_B \ln(\chi^2_B). \qquad (16)$$

To find the best-fit parameters $w$ we minimise the above quantity (i.e. maximise the probability) in $w$ space. Note that in this case our method is equivalent to assuming that we are ignorant of the relative scale of the errors in each experiment. It is as easy to calculate this statistic as the standard $\chi^2$. Eq. 16 actually generalises a similar equation derived by Cash (1979) using an entirely different set of assumptions.
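As a sanity check on Eqs. 11, 14 and 15, the marginalisation over $\alpha$ with the Jeffreys prior can be done numerically and compared with the closed form $\Gamma(N_A/2)\,(2/\chi^2_A)^{N_A/2}$. This is a sketch under our own choices (the substitution $u = \ln\alpha$, a trapezoidal rule on a fixed range, and the hypothetical function names `marginal_likelihood` and `closed_form`); it illustrates that $-2\ln P(D_A|w) - N_A\ln\chi^2_A$ is the same constant for every $\chi^2_A$, which is why Eq. 16 holds up to constants.

```python
import math

def marginal_likelihood(chi2, n, lo=-40.0, hi=40.0, steps=20000):
    """Integrate alpha**(n/2) * exp(-alpha*chi2/2) * alpha**-1 over alpha > 0 (Eqs. 11, 14).

    With u = ln(alpha) the integrand becomes exp(u*n/2 - exp(u)*chi2/2), which decays
    rapidly at both ends, so a trapezoidal rule on a wide u-range is accurate.
    """
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * du
        f = math.exp(u * n / 2 - math.exp(u) * chi2 / 2)
        total += f if 0 < k < steps else 0.5 * f  # half weight at the two end points
    return total * du

def closed_form(chi2, n):
    """Exact value of the same integral: Gamma(n/2) * (2/chi2)**(n/2)."""
    return math.gamma(n / 2) * (2 / chi2) ** (n / 2)

n = 12
for chi2 in (4.4, 8.8, 17.6):
    numeric = marginal_likelihood(chi2, n)
    # -2 ln P minus N ln chi^2 should not depend on chi^2 (Eq. 15 up to a constant).
    offset = -2 * math.log(numeric) - n * math.log(chi2)
    print(chi2, numeric / closed_form(chi2, n), offset)
```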

Since $\alpha$ and $\beta$ have been eliminated from the analysis by marginalisation, they do not have particular values that can be quoted. Rather, each value of $\alpha$ and $\beta$ has been considered and weighted according to the probability of the data given the model. However, it may be useful to know which values of $\alpha$ and $\beta$ were given the most weight. This can be estimated by finding the values of $\alpha$ and $\beta$ at which Eq. 14 peaks:

$$\alpha_{\rm eff} = \frac{N_A}{\chi^2_A}, \qquad (17)$$

and similarly

$$\beta_{\rm eff} = \frac{N_B}{\chi^2_B}, \qquad (18)$$

both evaluated at the joint peak. We note that if we substitute these effective $\alpha$ and $\beta$ in Eq. 2 we obtain $\chi^2_{\rm joint} = N_A + N_B$.
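The two statements above (that Eq. 14 peaks at $\alpha = N_A/\chi^2_A$, and that the effective values turn Eq. 2 into $N_A + N_B$) can be checked with a few lines of Python. The $\chi^2$ values and point counts here are hypothetical, and the brute-force scan over $\alpha$ is only an illustration of the analytic result.

```python
import math

def log_weight(alpha, chi2, n):
    """Log of Eq. 14 up to a constant: (n/2) ln(alpha) - alpha * chi2 / 2."""
    return 0.5 * n * math.log(alpha) - 0.5 * alpha * chi2

def alpha_eff(chi2, n):
    """Eqs. 17-18: the hyper-parameter value at which Eq. 14 peaks."""
    return n / chi2

# Hypothetical chi^2 values for two data sets at the joint best fit.
chi2_a, n_a = 6.0, 12
chi2_b, n_b = 15.0, 10

a, b = alpha_eff(chi2_a, n_a), alpha_eff(chi2_b, n_b)

# A brute-force scan over alpha in (0, 10] recovers the analytic peak position.
grid = [0.01 * k for k in range(1, 1001)]
a_scan = max(grid, key=lambda x: log_weight(x, chi2_a, n_a))

# Substituting the effective values into Eq. 2 gives chi^2_joint = N_A + N_B.
chi2_joint = a * chi2_a + b * chi2_b
print(a, b, a_scan, chi2_joint)
```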

There is of course freedom in choosing the prior. For example, if we take $P(\alpha) = 1$ (instead of Jeffreys' prior $P(\alpha) = 1/\alpha$), we find that the function to be minimised is

$$-2\ln P(w|D_A, D_B) = (N_A + 2)\ln(\chi^2_A) + (N_B + 2)\ln(\chi^2_B) \qquad (19)$$

instead of Eq. 16. Hence these two priors give very similar results for large $N_A$ and $N_B$. Numerous other priors are possible (e.g. a top-hat centred on a plausible value), but at the expense of more free HPs (e.g. the width of the top-hat). Illustrations of the HPs approach applied to toy models are given in Bridle (2000), and another application of the above HPs (to galaxy cluster data) is given in Diego et al. (2000).
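For reference, the step from Eq. 14 to Eqs. 16 and 19 can be written out with the standard integral $\int_0^\infty x^{s-1} e^{-bx}\,dx = \Gamma(s)\,b^{-s}$ (valid for $s, b > 0$). This worked derivation is our own addition, shown for data set $A$ only; the terms that do not depend on $\chi^2_A$ (and hence not on $w$) are collected into a constant.

```latex
\begin{aligned}
P(\alpha) = \alpha^{-1}:\quad
\int_0^\infty \alpha^{N_A/2-1} e^{-\alpha\chi^2_A/2}\, d\alpha
  &= \Gamma\!\left(\tfrac{N_A}{2}\right)\left(\tfrac{2}{\chi^2_A}\right)^{N_A/2}
  &&\Rightarrow\; -2\ln P(D_A|w) = N_A \ln\chi^2_A + \mathrm{const},\\
P(\alpha) = 1:\quad
\int_0^\infty \alpha^{N_A/2} e^{-\alpha\chi^2_A/2}\, d\alpha
  &= \Gamma\!\left(\tfrac{N_A}{2}+1\right)\left(\tfrac{2}{\chi^2_A}\right)^{N_A/2+1}
  &&\Rightarrow\; -2\ln P(D_A|w) = (N_A+2) \ln\chi^2_A + \mathrm{const}.
\end{aligned}
```

The two priors differ only in the weight, $N_A$ versus $N_A + 2$, attached to $\ln\chi^2_A$, so their relative effect shrinks as $N_A$ grows.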
3. Application to the CMB Data

3.1. The Boomerang and Maxima Data 

The recent Boomerang (hereafter B; de Bernardis et al. 2000) and Maxima (hereafter M; Hanany et al. 2000) CMB anisotropy measurements yielded high-quality angular power spectra $C_\ell$ over the spherical harmonics $400 \lesssim \ell \lesssim 800$. An important factor in interpreting the data is the calibration error. The experimental papers quote calibration errors of 10% and 4% (1-sigma in $\Delta T/T$) for B and M, respectively. The measurements (with B data corrected upward by 10%, and M data corrected downward by 4%) are shown in Figure 1, and they indicate a well-defined first acoustic peak at $\ell \sim 200$, with less convincing second and third peaks at higher harmonics. These measurements favour (under certain assumptions) a flat universe, spectral index $n = 1$ and baryon density $\Omega_b h^2 \approx 0.03$ (e.g. Jaffe et al. 2000; Bond et al. 2000; Bridle, this volume), which is about 2-sigma higher than the Big-Bang Nucleosynthesis (BBN) value $\Omega_b h^2 = 0.0190 \pm 0.0018$ (95% CL; Burles et al. 2000). Note that the recent CBI result (Padin et al. 2000) gives a higher power (at $\ell \sim 600$) relative to B&M. Jaffe et al. (2000) fitted models after combining the B&M data sets into one set. Here we take a different approach to the joint analysis of the two data sets by utilising the 'Hyper-Parameters'.
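The text does not spell out how the calibration correction was applied to the band powers. A minimal sketch, assuming a purely multiplicative calibration in $\Delta T/T$ that rescales both the measured values and their errors; the function name `calibrate` and the band-power numbers are hypothetical, not the published B or M values.

```python
def calibrate(bandpowers, errors, factor, power_units=False):
    """Rescale measured band powers and their errors by a calibration factor in Delta T/T.

    Assumption: the calibration is purely multiplicative. Quantities in temperature
    units (e.g. flat band powers delta T in microkelvin) scale by `factor`, while
    quantities in power units (C_l, proportional to delta T**2) scale by factor**2.
    """
    scale = factor ** 2 if power_units else factor
    return [scale * x for x in bandpowers], [scale * s for s in errors]

# Hypothetical band powers and errors in microkelvin.
b_cal, b_err = calibrate([70.0, 60.0], [7.0, 6.0], 1.10)  # Boomerang: upward by 10%
m_cal, m_err = calibrate([70.0, 60.0], [7.0, 6.0], 0.96)  # Maxima: downward by 4%
print(b_cal, m_cal)
```

Note that a 10% change in $\Delta T/T$ corresponds to a 21% change in power, which is why the `power_units` switch matters.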

3.2. Results 

We illustrate the effect of using HPs by application to measurements of the angular power spectrum of the Cosmic Microwave Background (CMB). Numerous groups have now used CMB data to estimate cosmological parameters. The most common method is the flat bandpower method (Bond 1995), in which the differences between observed and predicted flat bandpowers are compared using the $\chi^2$ statistic (Eq. 3). We note that non-zero correlations between the CMB data points can make the data points look smoother, which, since the theoretical model is smooth on this scale, will tend to improve the apparent goodness of fit to the model and thus inappropriately give more weight to correlated data points. We also note that the assumption that the Likelihood function is a Gaussian is only an approximation (Douspis et al. 2000).

L2000 applied the HPs approach to the pre-B&M CMB data sets, in different combinations. Here we apply the method to the recent high-quality B&M data, first in their 'raw' form and then in their calibrated form. For simplicity, we restrict ourselves to a very limited set of cosmological models. We obtain theoretical CMB power spectra using the CMBFAST and CAMB codes (Seljak & Zaldarriaga 1996; Lewis, Challinor & Lasenby 2000). We assume that CMB fluctuations arise from adiabatic initial conditions with Cold Dark Matter (CDM) and a negligible tensor component, in a flat Universe with $\Omega_m = 0.3$, $\Lambda = 1 - \Omega_m = 0.7$, $n = 1$, $Q_{\rm rms} = 18\,\mu$K and $\Omega_b h^2 = 0.03$. This choice is motivated by numerous other studies which combined CMB data with other cosmological probes (e.g. Jaffe et al. 2000; Bridle et al. 2000; Hu et al. 2000).
We then investigate the constraints on the remaining parameter, the dimensionless Hubble constant, $h = H_0/(100\ {\rm km\,s^{-1}\,Mpc^{-1}})$. Increasing $h$ decreases the height of the first acoustic peak, and makes few other significant changes to the angular power spectrum (e.g. Hu et al. 2000). The range in $h$ investigated here is $0.5 < h < 1.1$.

O. Lahav

Figure 1. The Boomerang data (top panel, calibrated by 1.10) and Maxima data (bottom panel, calibrated by 0.96). The line in each panel is for a $\Lambda$-CDM model with $n = 1$, $\Omega_m = 1 - \Lambda = 0.3$, $\Omega_b h^2 = 0.03$, $Q_{\rm rms} = 18\,\mu$K, and our 'best fit' $h = 0.8$.

Figure 2. Analysis with raw (uncalibrated) Boomerang & Maxima data. The top-left panel is for the individual $\chi^2$ of each data set (Eq. 3), while the top-right panel is for the sum of $\chi^2$ (Eq. 1). The bottom-left panel is the Hyper-Parameters probability with the prior $P(\ln\alpha) = 1$ (Eq. 16), and the bottom-right panel is for the prior $P(\alpha) = 1$ (Eq. 19).

The results using conventional $\chi^2$ (Eqs. 1 and 3) are shown
in Table 1, and those with the HPs approach (Eq. 16) in Table 2. The full likelihood functions are given in Figures 2 and 3. We see that the raw (uncalibrated) B&M data give two distinct values in the standard $\chi^2$ analysis. The HPs approach on the raw data suggests that B carries 4.5 times more weight than M (the ratio of the HPs), for this particular choice of model and parameter space, yielding a best $h = 0.88$. However, the calibration of the data (as described in the caption to Table 1) brings the two data sets into much better agreement (e.g. the ratio of the B/M HPs is now 1.3). In fact, in this case the standard joint $\chi^2$ and the HPs (for two different choices of priors; Eqs. 16 and 19) give the same result, $h = 0.79$, with slightly smaller error bars in the HPs case ($\pm 0.04$; 95% CL). This best-fit model is shown in Figure 1. We also tried the BBN value $\Omega_b h^2 = 0.019$ (last entries in Tables 1 and 2), which, as can be seen, gives a much poorer $\chi^2$ than the value $\Omega_b h^2 = 0.03$ (as also suggested by Jaffe et al. 2000 and others).
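The comparison between the joint $\chi^2$ (Eq. 1) and the HP statistic (Eq. 16) over a grid in $h$ can be sketched as follows. The real analysis compared CMBFAST/CAMB predictions with the B&M band powers; here, purely as an assumption for illustration, each data set is replaced by a hypothetical toy parabola $\chi^2(h)$ with its own minimum, floor and width. The toy shows the qualitative behaviour: the HP best fit is pulled towards the set with the lower $\chi^2$ per point, which receives the larger effective HP.

```python
import math

# Hypothetical toy chi^2(h) curves (NOT fits to the Boomerang or Maxima band powers).
def chi2_b(h):
    return 4.0 + ((h - 0.90) / 0.05) ** 2   # toy set "B", N = 12

def chi2_m(h):
    return 7.0 + ((h - 0.76) / 0.06) ** 2   # toy set "M", N = 10

N_B, N_M = 12, 10
h_grid = [0.5 + 0.005 * k for k in range(121)]  # grid covering 0.5 <= h <= 1.1

# Standard joint chi^2 (Eq. 1) versus the hyper-parameter statistic (Eq. 16).
h_chi2 = min(h_grid, key=lambda h: chi2_b(h) + chi2_m(h))
h_hp = min(h_grid, key=lambda h: N_B * math.log(chi2_b(h)) + N_M * math.log(chi2_m(h)))

# Effective hyper-parameters (Eqs. 17-18) evaluated at the hyper-parameter best fit.
alpha_b, alpha_m = N_B / chi2_b(h_hp), N_M / chi2_m(h_hp)
print(h_chi2, h_hp, alpha_b, alpha_m)
```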
Figure 3. Analysis with calibrated B&M data (by 1.10 and 0.96 in $\Delta T/T$, respectively; M dashed, B solid). The panels are as in Figure 2. Note the good agreement between the two data sets, and between the joint $\chi^2$ approach and the HPs approach (for the two different priors).



Data                             N    Best h   $\chi^2$
BOOMERANG (raw)                 12    0.90       4.4
MAXIMA (raw)                    10    0.76       7.1
B&M (raw)                       22    0.85      19.8
BOOMERANG (calibrated by 1.10)  12    0.80       8.8
MAXIMA (calibrated by 0.96)     10    0.79       8.9
B&M (calibrated)                22    0.79      17.7
B&M (calibrated, BBN)           22    0.72      35.7

Table 1. Conventional $\chi^2$ analysis using each of the B and M data sets alone, and both sets combined. Results are given for 'raw' (uncalibrated) data, and for calibration of $\Delta T/T$ by factors of 1.10 and 0.96 for B and M respectively, as explained in the text. For each data set the number of data points, N, the best-fit value of h and the $\chi^2$ value at this point are given. The full likelihood distributions in h are shown in Figures 2 and 3. Other parameters are fixed for a $\Lambda$-CDM model at $\Omega_m = 0.3$, $\Lambda = 1 - \Omega_m = 0.7$, $n = 1$, $Q_{\rm rms} = 18\,\mu$K and $\Omega_b h^2 = 0.03$. For the last entry $\Omega_b h^2 = 0.019$ (BBN value).
Data                     N    Best h   Effective HP
B&M (raw)               22    0.88     2.7 (B); 0.6 (M)
B&M (calibrated)        22    0.79     1.4 (B); 1.1 (M)
B&M (calibrated, BBN)   22    0.73     0.5 (B); 0.7 (M)

Table 2. The results of the Hyper-Parameters analysis (Eq. 16). The data sets are as described in Table 1. Shown are the number of data points N in each data set, the best-fitting value of h, and the effective HP ($N/\chi^2$) at this h. Other parameters were held fixed as described in Table 1.
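A quick arithmetic cross-check of Table 2 against Table 1 and the ratios quoted in Section 3.2, in Python. It is only approximate: the Table 1 $\chi^2$ values are at each set's own best fit ($h = 0.80$ for B, $0.79$ for M), while the Table 2 HPs are at the joint best fit $h = 0.79$; since these nearly coincide for the calibrated data, the effective HPs $N/\chi^2$ should agree to the quoted precision. The dictionary names are our own.

```python
# Calibrated entries from Table 1: number of points N and chi^2 at each set's own best fit.
table1_calibrated = {"B": (12, 8.8), "M": (10, 8.9)}

# Effective hyper-parameter N / chi^2 (Eqs. 17-18) for each set, rounded as in Table 2.
alpha_eff = {name: n / chi2 for name, (n, chi2) in table1_calibrated.items()}
rounded = {name: round(a, 1) for name, a in alpha_eff.items()}

# B/M ratios of the Table 2 hyper-parameters, as quoted in Section 3.2.
ratio_raw = 2.7 / 0.6          # raw data
ratio_calibrated = 1.4 / 1.1   # calibrated data
print(rounded, round(ratio_raw, 1), round(ratio_calibrated, 1))
```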

4. Discussion 

We have presented a formalism for analysing a combination of measurements, when it is likely that different systematics (or methods for calculating random errors) may affect each data set differently. By using a Bayesian analysis, and by using a specific 'non-informative' prior for the 'Hyper-Parameters' ($P(\ln\alpha) = 1$), we find that for $M$ data sets one should minimise

$$-2\ln P(w|{\rm data}) = \sum_{j=1}^{M} N_j \ln(\chi^2_j), \qquad (20)$$

where $N_j$ is the number of measurements in data set $j = 1, \ldots, M$. It is as easy to calculate this statistic as the standard $\chi^2$. The corresponding HPs $\alpha_{{\rm eff},j} = N_j/\chi^2_j$ provide useful diagnostics on the reliability of different data sets. We emphasize that a low HP assigned to an experiment does not necessarily mean that the experiment is 'bad', but rather that one should look for systematic effects or better modelling.
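Eq. 20 and the accompanying diagnostics fit in a short Python function. This is a sketch under stated assumptions: the name `hyper_parameter_analysis` is hypothetical, it takes the $\chi^2_j$ already computed at one trial parameter vector $w$ (the model predictions and the outer minimisation over $w$ are left to the caller), and the three data sets in the usage line are invented.

```python
import math

def hyper_parameter_analysis(chi2_values, n_values):
    """Eq. 20: -2 ln P(w|data) = sum_j N_j ln(chi2_j), up to a w-independent constant.

    Also returns the effective hyper-parameters alpha_eff_j = N_j / chi2_j, which show
    how much weight each data set effectively receives at this w.
    """
    if len(chi2_values) != len(n_values):
        raise ValueError("need one chi^2 value per data set")
    stat = sum(n * math.log(c) for c, n in zip(chi2_values, n_values))
    alphas = [n / c for c, n in zip(chi2_values, n_values)]
    return stat, alphas

# Three hypothetical data sets evaluated at one trial parameter vector w.
stat, alphas = hyper_parameter_analysis([9.0, 30.0, 5.0], [10, 10, 5])
print(stat, alphas)
```

Here the second data set, with $\chi^2 = 30$ for 10 points, receives $\alpha_{\rm eff} = 1/3$, the kind of low HP that, as noted above, calls for a look at its systematics rather than an automatic rejection.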

In L2000 we analysed pre-B&M data and found that while the standard $\chi^2$ approach gave a wide range for $H_0$, the Hyper-Parameter analysis suggested two distinct values of $H_0$, $\approx 50$ and $\approx 70$ km/sec/Mpc. Here we applied the method to the B&M data, with and without calibration. The HPs indeed 'detect' inconsistencies between the two 'raw' data sets, but the calibrated data sets show good agreement with each other, as seen in both the $\chi^2$ and the HPs statistics. We have also seen in this example that the HPs solution is insensitive to the exact choice of prior.

The best-fit Hubble constant is $H_0 = 79 \pm 4$ km/sec/Mpc (95% CL, random errors only) for a fixed flat $\Lambda$-CDM model with $\Omega_m = 1 - \Lambda = 0.3$, $n = 1$, $Q_{\rm rms} = 18\,\mu$K and $\Omega_b h^2 = 0.03$. We note that if more cosmological parameters are left free and then marginalised over, the error in $h$ would typically be larger (e.g. Bond; Bridle, this volume).

This combination of $\Omega_m$ and $H_0$ corresponds to an age of the Universe of 11.9 Gyr. Our derived $H_0$ is slightly higher than, but still consistent with, the 'final result' for $H_0$ from Cepheids and other distance indicators (Freedman et al. 2000), $H_0 = 72 \pm 3\,({\rm random}) \pm 7\,({\rm systematic})$ km/sec/Mpc (1-sigma errors).
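The quoted age can be reproduced with the standard closed-form age of a flat universe containing matter and a cosmological constant, neglecting radiation; we assume this is the relation behind the 11.9 Gyr figure. The function name is hypothetical, and the conversion $1/H_0 \approx 977.8/H_0$ Gyr (for $H_0$ in km/sec/Mpc) uses the Julian year.

```python
import math

def flat_lcdm_age_gyr(h0_km_s_mpc, omega_m):
    """Age of a flat Lambda-CDM universe (radiation neglected), in Gyr.

    t0 = (2 / (3 H0 sqrt(Omega_L))) * asinh(sqrt(Omega_L / Omega_m)), with Omega_L = 1 - Omega_m.
    """
    hubble_time_gyr = 977.792 / h0_km_s_mpc  # 1/H0 in Gyr for H0 in km/s/Mpc
    omega_l = 1.0 - omega_m
    shape = (2.0 / (3.0 * math.sqrt(omega_l))) * math.asinh(math.sqrt(omega_l / omega_m))
    return shape * hubble_time_gyr

age = flat_lcdm_age_gyr(79.0, 0.3)
print(round(age, 1))
```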

The above analysis can be extended in a number of ways. Current and future CMB data can be combined with other cosmological probes (and their corresponding HPs), and more cosmological parameters can be kept free. Here we used a simple correction for the calibration error. A more general approach is to marginalise over both the HPs and a calibration probability function (Bridle et al., in preparation). Two other aspects which can be modified according to specific problems are the priors $P(\alpha_j)$ and the probability functions $P(D_j|w)$. We shall discuss these extensions elsewhere.

Acknowledgments. I thank my collaborators S. Bridle, M. Hobson, A. Lasenby and L. Sodre for their contributions, and L. Page and J. Mould for helpful discussions.